    Is it time to stop sweeping data cleaning under the carpet? A novel algorithm for outlier management in growth data

    All data are prone to error and require cleaning prior to analysis. An important example is longitudinal growth data, for which there are no universally agreed standard methods for identifying and removing implausible values, and many existing methods have limitations that restrict their use across different domains. A decision-making algorithm was designed that modifies or deletes growth measurements based on a combination of pre-defined cut-offs and logic rules. Five data cleaning methods for growth data were tested with and without the addition of the algorithm on five longitudinal growth datasets: four uncleaned canine weight or height datasets and one pre-cleaned human weight dataset with randomly simulated errors. Prior to the addition of the algorithm, data cleaning based on non-linear mixed effects models was the most effective in all datasets and had on average a minimum of 26.00% higher sensitivity and 0.12% higher specificity than other methods. Data cleaning methods using the algorithm had improved data preservation and were capable of correcting simulated errors according to the gold standard, that is, returning a value to its original state prior to error simulation. The algorithm improved the performance of all data cleaning methods and increased the average sensitivity and specificity of the non-linear mixed effects model method by 7.68% and 0.42% respectively. Cleaning data with non-linear mixed effects models combined with the algorithm allows individual growth trajectories to vary from the population by using repeated longitudinal measurements, identifies consecutive errors and errors in the first data entry, avoids the requirement for a minimum number of data entries, preserves data where possible by correcting errors rather than deleting them, and removes duplications intelligently. This algorithm is broadly applicable to cleaning anthropometric data in different mammalian species and could be adapted for use in a range of other domains.
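    The abstract above compares cleaning methods by sensitivity and specificity against simulated errors. As a minimal sketch of how those two measures could be computed, assuming each measurement carries a known "is a simulated error" label and a "flagged by the cleaning method" label: the function name and the toy labels below are hypothetical and are not taken from the paper.

```python
import math


def sensitivity_specificity(is_error, flagged):
    """Return (sensitivity, specificity) for two equal-length boolean sequences.

    is_error: True where a measurement is a known (e.g. simulated) error.
    flagged:  True where the cleaning method marked the measurement as implausible.
    """
    if len(is_error) != len(flagged):
        raise ValueError("is_error and flagged must have the same length")

    pairs = list(zip(is_error, flagged))
    tp = sum(1 for e, f in pairs if e and f)          # errors correctly flagged
    fn = sum(1 for e, f in pairs if e and not f)      # errors missed
    tn = sum(1 for e, f in pairs if not e and not f)  # valid values left alone
    fp = sum(1 for e, f in pairs if not e and f)      # valid values wrongly flagged

    # Return NaN when a measure is undefined (no errors or no valid values).
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return sensitivity, specificity


# Toy labels, invented purely for illustration.
truth = [False, True, False, False, True, False]
flags = [False, True, True, False, False, False]
sens, spec = sensitivity_specificity(truth, flags)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

    In this sketch, sensitivity is the share of true errors that the method flags and specificity is the share of valid measurements that it leaves untouched, so a method that deletes aggressively can score well on the first while losing data on the second.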

    Management factors associated with seropositivity to Lawsonia intracellularis in US swine herds.

    This study was conducted to determine risk factors for Lawsonia intracellularis seropositivity in the breeding and grower-finisher units of US farrow-to-finish swine herds. Serum was collected from 15 breeding females and 15 grower-finisher pigs per herd in 184 farrow-to-finish herds, a subset of 405 herds in the National Animal Health Monitoring System (NAHMS) Swine 1995 Study that examined management, health and productivity in herds with at least 300 finisher pigs. Sera were tested for L. intracellularis by indirect fluorescent antibody test. Test results were linked with NAHMS questionnaire data, and a logistic regression model of management factors associated with L. intracellularis serological status was developed. Separate models were used for breeding and grower-finisher units. Risk factors for seropositive breeding units were L. intracellularis-seropositive status of the grower-finisher unit, use of a continuous system of management for the farrowing unit and a young parity structure (<75% multiparous sows). Risk factors for seropositive grower-finisher units were L. intracellularis-seropositive status of the breeding unit, the number of pigs entering the grower-finisher stage, raising pigs on concrete slats, and intensive management compared with raising pigs on outdoor lots. Use of all-in/all-out management in the farrowing house and an older parity structure in the sow herd were associated with a lower risk of L. intracellularis seropositivity in the breeding unit, and slatted concrete flooring in grower-finisher houses was associated with a greater risk. Alteration of these management factors might improve control of L. intracellularis infection in farrow-to-finish herds.